The goal of this paper is to review recent research on copy number variations (CNVs) and 
their association with complex and rare diseases. In the latter part of this paper, we focus on 
how large biorepositories such as the electronic medical record and genomics (eMERGE) 
consortium may be best leveraged to systematically mine for potentially pathogenic CNVs, 
and we end with a discussion of how such variants might be reported back for inclusion in 
electronic medical records as part of medical history. 
WHAT ARE COPY NUMBER VARIATIONS? 

Copy number variations (CNVs) are deletions and duplications in the genome that vary in
length from ~50 base pairs to many megabases (50 base pair to 1 kilobase CNVs are typically
considered indels). Events that cause CNVs include non-allelic homologous recombination,
non-homologous end-joining, transposition of transposable elements, transposition of
pseudogenes, variable numbers of tandem repeats, and replication errors following
template-switching or fork stalling. CNVs are the primary mode by which an individual
acquires a mutation, and occur at a rate of approximately 1.7 × 10^-6 per locus as opposed
to 1.8 × 10^-8 for sequence variation (Lupski, 2007). Estimates of CNV frequency vary
depending on the size of the structural variation classed as CNV: some estimates suggest
that up to 12% of the genome may be variable in copy number, and that the cumulative result
of CNV inheritance may constitute more than 10% of the human genome (Carter, 2007; Lupski
et al., 2010). Recent studies suggest that the average human genome contains >1000 CNVs,
covering approximately four million base pairs (Conrad et al., 2010; Mills et al., 2011),
and that CNVs occur at a rate of 0.07-0.12 per generation (Cordaux and Batzer, 2009; Itsara
et al., 2010; Beck et al., 2011; Malhotra and Sebat, 2012). The Database of Genomic Variants
(DGV)¹ currently lists over 100,000 published, unique CNVs across the genome. While the
majority continue to be regarded as benign, an increasing number of CNVs have been
associated with disease susceptibility. The functional consequences of CNVs typically
reflect a gene dose effect and include truncated protein sequences, eliminated or reduced
protein expression (typically the result of deletions), and increased protein expression
(typically caused by duplications).

¹ http://dgv.tcag.ca/dgv/app/home

HOW ARE COPY NUMBER VARIATIONS IDENTIFIED? 
ARRAY-BASED APPROACHES 

A range of approaches is available for detecting CNVs (Figure 1). The most common
approaches are computational, leveraging signals from genotyping and sequencing to infer
CNVs. For example, large chromosomal anomalies can be detected through log R ratio (LRR)
and B-allele frequency (BAF), data routinely generated and provided with single nucleotide
polymorphism (SNP) and exome microarrays (e.g., Figure 2). For replication and validation,
quantitative PCR, which compares the threshold cycles of a target versus a reference
sequence, is still widely deployed. In a similar vein, paralog ratio testing and molecular
copy number counting are also used for validation.
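
As a rough illustration of the qPCR comparison described above, the sketch below applies the widely used 2^(-ΔΔCt) relative-quantification formula. It assumes roughly 100% amplification efficiency and a calibrator sample known to carry two copies of the target; the function name and the threshold-cycle values are hypothetical and are not taken from any study cited here.

```python
def estimate_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_calibrator, ct_ref_calibrator,
                         calibrator_copies=2):
    """Estimate target copy number with the 2^(-ddCt) method.

    Assumptions (illustrative only): PCR efficiency is ~100%, so signal
    doubles each cycle, and the calibrator sample carries
    `calibrator_copies` copies of the target sequence.
    """
    # Normalize the target to the reference within each sample.
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the test sample with the calibrator.
    dd_ct = d_ct_sample - d_ct_calibrator
    return calibrator_copies * 2 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold one cycle later
# in the test sample than in the calibrator.
print(estimate_copy_number(26.0, 24.0, 25.0, 24.0))  # 1.0
```

An estimate near 1 would be consistent with a heterozygous deletion and an estimate near 3 with a single-copy gain; in practice, replicate reactions and measured primer efficiencies would be needed before drawing such a conclusion.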

For high-throughput CNV detection, the most common platforms are comparative genomic
hybridization (CGH) arrays, genome-wide association (GWA) arrays, and second-generation
sequencing (SGS). CGH arrays use bacterial artificial chromosomes or long synthetic
oligonucleotides to probe either specific regions of interest or the entire genome
(Greshock et al., 2007; Haraksingh et al., 2011). While this method has relatively low
spatial resolution (typically >5-10 Mb; Kallioniemi et al., 1993) and requires a relatively
large volume of DNA, CGH does offer high sensitivity and specificity (Greshock et al., 2007;
Haraksingh et al., 2011), which is critical in a diagnostic context.

Single nucleotide polymorphism (SNP) arrays are more commonly used for CNV analysis, and
CNVs can be identified from standard GWA array signals, or from arrays that utilize custom
probes. Custom probes offer greater coverage of non-SNP sites, and can offer high
sensitivity, particularly with regard to breakpoint resolution (Haraksingh et al., 2011).
While conventional (i.e., non-custom) SNP arrays offer less specificity, they nevertheless
represent a cost-effective option for characterizing CNVs and have been successfully
applied to a wide range of phenotypes to date (Connolly and Hakonarson, 2012).

www.frontiersin.org    March 2014 | Volume 5 | Article 51
Connolly et al.    CNV analysis of eMERGE phenotypes

FIGURE 1 | CNV detection using different platforms: platforms vary in their capacities to
detect CNVs. [Figure not reproduced. Recoverable labels: information for CNV detection;
platform; aCGH; tag SNP array; exome sequencing; whole genome sequencing; pre-mapped;
require mapping.]

FIGURE 2 | CNV detection in SNP-array data using PennCNV: example log R ratio (LRR) and
B allele frequency (BAF) values for the chromosome 15 q-arm of an individual. Three normal
chromosomal BAF genotype clusters (AA, AB, and BB genotypes) have LRR values around zero.
The copy-neutral loss-of-heterozygosity (LOH) region has normal LRR values, but no AB
cluster. Increased copy number can be observed in the increased number of peaks in the BAF
distribution and increased LRR values. LRR and BAF patterns are different for different CNV
regions, and can be used to generate CNV calls. Adapted from Wang et al. (2007).

Importantly, it is possible to retroactively characterize CNVs from existing genome-wide
association study (GWAS) data. In this context, the observed SNP signal of an allele
relative to the normalized intensity of the allele can be used to deduce a deletion
(decreased intensity) or duplication (increased intensity; Glessner et al., 2012). This
possibility constitutes a major opportunity for custodians of large biorepositories such as
electronic medical record and genomics (eMERGE), where a large volume of GWAS data has
already been generated. Since its founding in 2007, the eMERGE consortium has produced
dozens of GWASs on a range of phenotypes including lipids (Rasmussen-Torvik et al., 2012),
arrhythmia (Ritchie et al., 2013), and white blood cell count (Crosslin et al., 2012), to
name a few. For many of these phenotypes, no CNV studies have been published to date. This,
we believe, represents an opportunity to identify new disease-associated loci without the
generation of new genotype data, and will be addressed by the consortium in the immediate
future. Similarly, we note that a large number of studies listed in the NHGRI GWAS catalog²
do not have complementary CNV data, suggesting a largely under-utilized resource.
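
The intensity-based inference described above can be sketched in a few lines. The example below computes an LRR from observed and expected intensities and then labels a run of consecutive SNPs from its mean LRR and its BAF pattern, following the qualitative rules in the Figure 2 caption. It is a minimal sketch and not the PennCNV algorithm, which uses a hidden Markov model; the function names, the 0.2 LRR cutoff, and the 0.4-0.6 heterozygous band are assumptions chosen only for illustration.

```python
import math
from statistics import mean


def log_r_ratio(observed_intensity, expected_intensity):
    """LRR: log2 of observed over expected total probe intensity."""
    return math.log2(observed_intensity / expected_intensity)


def classify_segment(lrr_values, baf_values,
                     lrr_cutoff=0.2, het_band=(0.4, 0.6)):
    """Label a run of consecutive SNPs from its LRR and BAF pattern.

    The cutoffs are hypothetical; real callers model noise, probe
    spacing, and population allele frequencies.
    """
    mean_lrr = mean(lrr_values)
    # An AB (heterozygous) cluster shows up as BAF values near 0.5.
    has_ab_cluster = any(het_band[0] <= b <= het_band[1] for b in baf_values)
    if mean_lrr < -lrr_cutoff:
        return "deletion"          # decreased intensity
    if mean_lrr > lrr_cutoff:
        return "duplication"       # increased intensity
    if not has_ab_cluster:
        return "copy-neutral LOH"  # normal LRR but no AB cluster
    return "normal"


# Hypothetical values for a three-SNP run with reduced intensity.
print(classify_segment([-0.6, -0.5, -0.7], [0.0, 1.0, 0.0]))  # deletion
```

A short run of homozygous SNPs can also lack an AB cluster by chance, so a rule of this kind is only meaningful over segments containing many markers.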

For array-based analyses, a range of packages is available. Both Affymetrix and Illumina,
the two primary purveyors of SNP arrays, offer free software packages for CNV analysis.
Independently developed toolsets are also available. These include circular binary
segmentation (Olshen et al., 2004), MixHMM (Liu et al., 2010), GADA (Pique-Regi et al.,
2008), PennCNV (Figure 2; Wang et al., 2007), and ParseCNV (Glessner et al., 2013a; the
latter two were developed by eMERGE researchers and are widely used).

SEQUENCING-BASED APPROACHES 

Common CNVs are well-covered by SNPs in existing arrays (Conrad et al., 2010; Wellcome
Trust Consortium et al., 2010). However, a resequencing study by Pang et al. (2010) suggests
that coverage of rare CNVs may be less comprehensive. The authors identified over 12,000
structural variants in 4,867 genes across 40+ Mb of sequence (the Venter genome), which had
been initially unreported. More than 24% of these CNVs would not have been imputed by
SNP-association. Given that rare alleles can have large effect sizes and a high penetrance,
these results underline the limitations of SNP arrays in identifying certain pathogenic
CNVs. SGS, which is far more proficient at identifying rare CNVs, offers an attractive
solution in this regard, particularly in identifying novel insertions absent in the
reference genome. This has obvious clinical utility. SGS also confers a number of other
critical advantages in terms of its ability to identify smaller CNVs (<50 bp), and an
enhanced capability for detecting breakpoints (Li and Olivier, 2013). Indeed, because SGS
allows us to probe breakpoints at the level of base pairs, it facilitates capture of the
signature of potential mutational mechanisms (Li and Olivier, 2013).

² http://www.genome.gov/gwastudies/

With SGS data, the most common methods for CNV identification from short-read analysis
(Medvedev et al., 2010) are read-depth analysis (Xie and Tammi, 2009; Yoon et al., 2009;
Abyzov et al., 2011), split-read mapping (Mills et al., 2006), paired-end read mapping
(Korbel et al., 2009), and clone-based sequencing (Kidd et al., 2008). For all approaches,
the most important determinants of accuracy are alignment and read length. The average
length of (reliable) reads is ~100-150 bp, which is insufficient to eliminate erroneous
mapping. As this metric improves, CNV-calling algorithms will become more accurate.
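
Read-depth analysis, the first of the approaches listed above, rests on the idea that the number of reads mapped to a region scales with its copy number. The sketch below converts per-window read depths into integer copy-number estimates. It is a deliberately simplified illustration that assumes fixed-size windows, a sample in which most windows are at normal diploid copy number (so the median depth stands for two copies), and no GC-content or mappability correction. It does not reproduce the algorithm of any tool cited here, and the function name is hypothetical.

```python
from statistics import median


def copy_number_from_depth(window_depths, ploidy=2):
    """Convert per-window read depths to integer copy-number estimates.

    Assumes the median window is at normal copy number (`ploidy`);
    biases such as GC content and mappability are ignored here.
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return [round(ploidy * depth / baseline) for depth in window_depths]


# Hypothetical depths: two low windows and two high windows.
depths = [30, 31, 29, 15, 14, 30, 45, 46, 30]
print(copy_number_from_depth(depths))  # [2, 2, 2, 1, 1, 2, 3, 3, 2]
```

In this toy input, the two windows at roughly half the median depth are called as single-copy losses and the two at roughly 1.5 times the median as single-copy gains.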

A large number of algorithms have been developed for identifying CNVs from sequencing data,
including CNVnator (Abyzov et al., 2011), PennCNV-Seq (in press), GenomeStrip (Handsaker
et al., 2011), cnvHiTSeq (Bellos et al., 2012), and XHMM (Fromer et al., 2012). Different
CNV algorithms have different strengths and weaknesses (see Li and Olivier, 2013 for
review), and the most effective strategy in terms of minimizing erroneous CNV calls is to
incorporate multiple toolsets, which can be validated computationally via local de novo
assembly (e.g., see SVMerge, Wong et al., 2010).
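
One simple way to combine toolsets, sketched below, is to retain only those calls from one caller that are supported by a same-type call from a second caller. This is a concordance filter, not the local de novo assembly validation performed by SVMerge. The tuple layout `(chrom, start, end, type)`, the function names, and the 50% reciprocal-overlap threshold are assumptions made for the example.

```python
def reciprocal_overlap(a, b):
    """Smallest fraction of either interval covered by their intersection.

    `a` and `b` are (chrom, start, end) tuples with end > start.
    """
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def consensus_calls(calls_a, calls_b, min_overlap=0.5):
    """Keep calls from `calls_a` supported by a same-type call in `calls_b`.

    Each call is a (chrom, start, end, cnv_type) tuple.
    """
    return [
        a for a in calls_a
        if any(a[3] == b[3] and
               reciprocal_overlap(a[:3], b[:3]) >= min_overlap
               for b in calls_b)
    ]


# Hypothetical output from two callers.
caller_1 = [("chr1", 1000, 2000, "DEL"), ("chr2", 500, 900, "DUP")]
caller_2 = [("chr1", 1200, 2100, "DEL"), ("chr2", 500, 900, "DEL")]
print(consensus_calls(caller_1, caller_2))  # [('chr1', 1000, 2000, 'DEL')]
```

Requiring agreement reduces false positives at the cost of sensitivity, so the threshold and the choice of callers would need to be tuned to the platform and the CNV size range of interest.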

DISEASE-ASSOCIATED COPY NUMBER VARIATIONS 

As discussed elsewhere in this issue, GWASs have been successful in identifying common risk
variants, particularly where the frequency of such variants is >5%. In addition to common
variants, certain disorders have been shown to be enriched for rare CNVs (Conrad et al.,
2010; Pang et al., 2010). In terms of functional impact, CNVs have been shown to be enriched
in genes involved in immune responses, cell-cell signaling, and retrovirus- and
transposition-related protein coding (Li and Olivier, 2013). A large number of phenotypes
have now been associated with CNVs, including several rare diseases (Matsuura et al., 1997)
and a range of neurodevelopmental disorders (Glessner et al., 2012), including depression
(Glessner et al., 2010c), schizophrenia (Glessner et al., 2010b), and autism (Glessner
et al., 2009). Autism provides a particularly good example of how our understanding of
genetic risk factors and etiology is enhanced by CNV research, as demonstrated by a recent
exome sequencing study (Iossifov et al., 2012) involving 343 families from the Simons
Simplex Collection.

The study identified 59 "likely gene disruptions (LGDs)" in autism cases. Interestingly,
the 59 LGD targets overlapped strongly with a set of 842 proteins that interact with the
fragile X protein, FMRP. In total, 14 of the 59 LGDs encoded FMRP-interacting proteins
(P = 0.006), as did 13 of 72 CNV candidates from the group's previous CNV paper
(P = 0.0004). Thus, 26 of the 129 combined candidates were FMRP-related (P < 1 × 10^-13).
These results mark the fragile X mental retardation 1 (FMR1) gene as a high-profile autism
candidate. Screening upstream targets of FMR1, the same group identified a deletion in GRM5
that removes a single amino acid, causing an additional substitution at the same site. GRM5
encodes the glutamate receptor mGluR5 (Bear et al., 2004), which has been proposed as a
translational target in both autism spectrum disorder (ASD) and attention-deficit/
hyperactivity disorder (ADHD; Elia et al., 2012; Silverman et al., 2012).

Several other CNV studies of autism have uncovered informative rare recurrent CNVs. Our
laboratory recently identified a range of CNVs in two major gene networks, ubiquitins and
neuronal cell adhesion molecules, that predispose to autism (Glessner et al., 2009). The
ubiquitin-proteasome system is known to operate at pre- and post-synapses, where it
mediates neurotransmitter release and the recycling of synaptic vesicles in pre-synaptic
terminals, and modulates changes in dendritic spines and post-synaptic density (Yi and
Ehlers, 2005). Neuronal cell adhesion molecules contribute to neurodevelopment by
facilitating axon guidance, synapse formation and plasticity, and neuron-glial interactions.

Results from these and several other CNV studies suggest that genomic hotspots may be
particularly vulnerable, which for autism include loci on chromosomes 1q21, 3p26,
15q11-q13, 16p11, and 22q11 (Bucan et al., 2009; Glessner et al., 2009; Pinto et al., 2010).
Interestingly, these hotspots are part of large gene networks that are important to neural
signaling and neurodevelopment, and have additionally been associated with other
neuropsychiatric disorders. For example, studies of schizophrenia have highlighted
structural mutations incorporating chromosomes 1q21, 15q13, and 22q11 (Glessner et al.,
2010b). From an etiological perspective, autism and schizophrenia seem extremely different
and it would seem counter-intuitive that associated loci should overlap. Some authors have
addressed this peculiarity by proposing that the two disorders may in fact be opposite
poles of the same spectrum (Crespi and Badcock, 2008). While such propositions await
confirmation, they do highlight the potential of CNV studies to generate new hypotheses
about the nature of complex diseases. Although individual structural variants explain
relatively little by way of genetic variance, their cumulative effect is likely to be
considerable. For autism, Marshall et al. (2008) suggested that CNVs play a causal role in
7% of cases.

Beyond neuropsychiatric diseases, CNV studies have been published across a range of disease
types, including heart disease (Goldmuntz et al., 2011), obesity (Glessner et al., 2010a),
and cancer (Kuusisto et al., 2013). They have also recently been implicated in altered
lifespan through alternative splicing mechanisms (Glessner et al., 2013b).

COPY NUMBER VARIATIONS IN THE CONTEXT OF THE 
EMERGE CONSORTIUM 

As illustrated in Table 1, the eMERGE consortium biorepository includes ~60,000 individuals
who have been genotyped on high-density GWA arrays³, all of whom have been linked with
electronic medical records (EMRs). The size and diversity of the repository are such that
they raise the possibility of deep mining of disease-associated variants across multiple
phenotypes. It is inevitable that a reasonable proportion of these individuals have
disease-associated CNVs, and a larger proportion may be carriers of structural variants in
recessive disease genes. By systematically characterizing CNVs across the biorepository, we
have a very obvious opportunity to catalog CNVs and their disease-burden status. We have
now run PennCNV on eMERGE Phase I data (2007-2011), and will soon have circular binary
segmentation analyses complete for the same set (50-kb to whole-chromosome). Relevant
analyses will play a major role in the consortium's Phase II genomics program (2012-2015).

³ http://www.genome.gov/27540473

Similarly, the eMERGE consortium recently embarked upon a large-scale pharmacogenomics
project [n = ~9000, review at Rasmussen-Torvik et al. (2012) in this issue], featuring a
targeted sequencing platform developed by the Pharmacogenomics Research Network (PGRN), and
covering 84 genes considered important for drug-gene interactions⁴. While the primary
purpose of this project is to screen for existing pathogenic variants, it also offers an
important opportunity to probe for novel variants in existing candidate genes, and to
return results to patients' medical records. This clearly cannot be accomplished without
paying heed to extensive medical, psychological, and ethical considerations, which are
addressed elsewhere in this issue and in previous literature (Green et al., 2013). Assuming,
however, that such considerations are adequately addressed, the section below considers how
this might be accomplished and the potential to impact clinical care.

INTEGRATING CNVs WITH MEDICAL RECORDS - WHAT ARE 
THE OBSTACLES? 

As discussed at length in this issue, the possibility of linking genomics data with EMRs
represents a potentially major healthcare opportunity. Which variants/results to report,
and how to report them, remains open to debate, and indeed part of the remit of the eMERGE
consortium is to think through these hurdles.

An obvious first step is determining the pathogenicity of relevant CNVs. Traditionally
(e.g., cytogenetics), interpretation of CNVs has concentrated on diseases where the mode of
inheritance was dominant, and relied on simple case-control comparisons to discriminate
pathogenic from non-pathogenic variations. Where the CNV was common (i.e., frequency
>1-5%), it was typically classed as non-pathogenic. Thus, by process, "rare" implied
"pathogenic." With SGS and the increased capacity to detect smaller CNVs, this assumption
falls down to a certain extent. We have started to see numerous studies where the de novo
rate of small CNVs in both cases and controls is as high as 5-10%. For rare CNVs in complex
diseases, there is often insufficient power on which to base a judgment. Public databases
that catalog pathogenic and non-pathogenic CNVs are therefore critical to determining
frequencies of CNVs in disease cases and healthy controls.
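
To make the case-control comparison and the power problem concrete, the sketch below implements a one-sided Fisher's exact test for enrichment of CNV carriers among cases, using only the Python standard library. Fisher's test is a common choice for small carrier counts, but it is offered here only as an illustration, not as the method used by any study cited in this review; the function name and all counts are hypothetical.

```python
from math import comb


def fisher_exact_one_sided(case_carriers, case_total,
                           control_carriers, control_total):
    """P-value for observing at least `case_carriers` carriers among cases.

    Uses the hypergeometric null: carriers are distributed between cases
    and controls at random, with all margins of the 2x2 table fixed.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, case_total)
    most_extreme = min(carriers, case_total)
    p_value = 0.0
    for k in range(case_carriers, most_extreme + 1):
        p_value += (comb(carriers, k) *
                    comb(total - carriers, case_total - k) / denominator)
    return p_value


# Hypothetical rare CNV: 5 carriers in 1,000 cases, 1 in 1,000 controls.
print(fisher_exact_one_sided(5, 1000, 1, 1000))
```

With carrier counts this small, even a several-fold difference in frequency may not reach conventional significance thresholds, which is the practical reason that large shared catalogs of case and control frequencies matter.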

Perhaps the most widely used catalog is the DGV, which aims to provide a "comprehensive
summary of structural variation in the human genome" based on peer-review of relevant
studies. While the DGV has obvious clinical and research relevance, several recent
commentaries (Duclos et al., 2011; Hehir-Kwa et al., 2013) have urged caution in relying
too heavily on its frequency and mapping statistics. As highlighted by Lee et al. (2007),
many CNVs in the DGV are derived from single platforms/technologies, which may not
necessarily translate to alternate approaches. Several recent studies (Perry et al., 2008;
Conrad et al., 2010) suggest that because of relatively low resolution in some studies, the
size of relevant CNVs may be smaller than outlined in the DGV. Duclos et al. (2011) drew
similar conclusions, stressing the "urgent need to validate the frequencies and boundaries
of the CNVs recorded in the DGV." This conclusion is based on the group's finding that some
of the recorded CNVs are erroneously listed as polymorphic, which, if implemented in a
medical setting, may lead to a deleterious CNV being called benign. Alternate CNV databases
(e.g., dbVar; Lappalainen et al., 2013) have been established, but all are constrained by
the quality of data on which they are based.

Table 1 | Summary of biorepositories and electronic medical records (EMRs) at 10 eMERGE institutions. Adapted from Gottesman et al. (2013).

Institution | Biorepository | Recruitment model | Biorepository size | Race/ethnicity and age of donors
Boston Children's Hospital | Gene Partnership | Outpatient and hospital-based | 3,372 | 83% European, 9% African, 6% Asian, 11% Hispanic/Latino; mean age: 23 years
Children's Hospital of Philadelphia | A Study of the Genetic Causes of Complex Pediatric Disorders | Population-based and disease-specific | 60,000 internal (plus 100,000 external) | 47.0% European, 43.3% African, 7.0% Admixed, 1.7% Asian, 0.8% Hispanic, 0.2% Native Amer.; mean age: 11 years
Cincinnati Children's Hospital | Better Outcomes for Children | Outpatient and hospital-based | 8,472 | 73% European, 10% African; mean age: 9 years
Geisinger Clinic | MyCode® | Population-based and disease-specific | 35,000 | 98% European; age: < 89 years
Group Health Seattle | ACT Study; Alzheimer's Disease Patient Registry (ADPR); Northwest Institute of Genetic Medicine (NWIGM) | Disease-specific and HMO-based | 5,859 | 92% European; age: > 50 years
Marshfield Clinic Research Foundation | Personalized Medicine Research Project | Population-based | 20,000 | 98% European; mean age: 48 years
Mayo Clinic | Vascular disease biorepository (VDB); Mayo Clinic Biobank; other disease-specific | Outpatient-based | 36,000 | 97% European; mean age: 63 years
Mount Sinai School | BioMe™, The Charles Bronfman Biobank Program | Outpatient and | 25,000 | 40% Hispanic/Latino, 25% African
Northwestern University | NUgene | Outpatient and hospital-based | 12,000 | 9% Hispanic/Latino, 12% African, 78% European; mean age: 48 years
Vanderbilt University | BioVU | Outpatient and hospital-based | 155,000 | 2% Hispanic/Latino, 15% African, 80% European; mean age: 49 years

Other obstacles that have hampered development of CNV databases are inconsistent annotation
of genomic data across studies, ill-defined curation protocols (e.g., QC-reporting,
CNV-calling parameters), and incomplete phenotypic data. In each case, there is potential
for consortium-led efforts to delineate best practices. To address the challenge of
incomplete phenotypes, there is a particular opportunity for the eMERGE network. The
majority of individuals enrolled in the eMERGE repository have their longitudinal EMRs
linked to their genotype. This affords far greater potential for determining pathogenicity
than traditional case-control studies, where controls may be categorized as lacking a
specific disease state, with no other phenotype data. Completeness-of-EMR is critical in
this regard. For patients enrolled in the biorepository at The Children's Hospital of
Philadelphia, the mean duration of EMRs is ~5.5 years, and is similar across other eMERGE
sites. Relevant data include all ICD-9 diagnoses, lab values, procedures, and medications.
Data of this length and depth should be considered minimal requirements for addressing
pathogenicity on a large scale, while supplementation with disease-specific measures is
also highly desirable.

Another major challenge in returning CNV data to patients' EMRs concerns the nature of
inheritance. An interesting study by Boone et al. (2013) recently sought to determine the
rate of CNVs in recessive disease genes. The group used CGH to characterize deletion CNVs
in 21,470 individuals, identifying 3,212 heterozygous potential carrier deletions in 419
unique disease-associated genes. While many of these CNVs are likely benign polymorphisms,
the group identified 206 heterozygous CNVs in multiple recessive genes, spanning 2-6 genes
in each deletion. These CNVs, therefore, confer carrier status for multiple recessive
conditions. Similarly, 307 individuals had multiple deletions in recessive disease genes.
While many of these gene pairs have unrelated function, a non-trivial proportion belongs to
a shared pathway. Indeed, one participant had a CNV spanning three recessive immune genes,
PSMB8, TAP1, and TAP2, which are associated with autoinflammation, lipodystrophy, and
dermatosis syndrome (PSMB8), and type I bare lymphocyte syndrome (TAP1 and TAP2). He also
had a CNV in CD19, mutations of which are associated with common variable immunodeficiency.
The authors were unable to determine whether the individual had a compromised immune system
or presented with a history of immune disease (samples were anonymized). Nevertheless, he
was clearly a multiple-deletion carrier, as were ~1.5% of the cohort: such information may
be of direct clinical relevance to individuals' offspring, and whether this should be
shared remains open to debate.
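
Flagging multiple-deletion carriers of the kind described above is straightforward once deletions have been annotated with the genes they span. The sketch below is a minimal illustration under assumed inputs: each deletion call is an `(individual_id, genes_hit)` pair, and the recessive gene set is supplied by the user. The identifiers `P1` to `P3` and `GENE_X` are invented, and the four-gene set simply reuses the genes named in the example above rather than a curated list.

```python
from collections import defaultdict


def multi_gene_carriers(deletion_calls, recessive_genes, min_genes=2):
    """Return individuals whose deletions hit >= `min_genes` recessive genes.

    `deletion_calls` is an iterable of (individual_id, genes_hit) pairs,
    one per heterozygous deletion; `recessive_genes` is a set of symbols.
    """
    hits = defaultdict(set)
    for individual_id, genes_hit in deletion_calls:
        # Keep only the genes that appear in the recessive disease list.
        hits[individual_id].update(set(genes_hit) & recessive_genes)
    return {individual: sorted(genes)
            for individual, genes in hits.items()
            if len(genes) >= min_genes}


# Hypothetical annotated deletion calls.
calls = [
    ("P1", ["PSMB8", "TAP1", "TAP2"]),  # one deletion spanning three genes
    ("P1", ["CD19"]),                   # a second deletion, same person
    ("P2", ["TAP1"]),
    ("P3", ["GENE_X"]),                 # not in the recessive gene set
]
recessive = {"PSMB8", "TAP1", "TAP2", "CD19"}
print(multi_gene_carriers(calls, recessive))
# {'P1': ['CD19', 'PSMB8', 'TAP1', 'TAP2']}
```

A tally like this identifies carrier status only; whether and how such a finding is returned to an EMR is the policy question raised in the text.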

Inherited CNVs pose a similar set of problems. While the majority of inherited CNVs may be
in loci that lead to recessive disorders, this is not always the case. Indeed, one of the
best-known CNVs is the duplication at 15q11-q13, which accounts for up to 3% of autism
cases (Sebat et al., 2007; Marshall et al., 2008). A complex scenario was recently described
by Knijnenburg et al. (2009), in which a homozygous deletion at 15q13.3 in a child
(inherited from non-consanguineous, hemizygous carrier parents) resulted in hearing loss.
Critically, if the CNV is a gain, three copies may have no phenotypic effect but four
copies may have clinical consequences (Giorda et al., 2011). Conversely, when one parent
carries a CNV loss in a recessive disease gene and the other parent carries a mutation in
the same gene, this can result in compound heterozygosity in offspring (Hehir-Kwa et al.,
2013; Paciorkowski et al., 2013). These findings stress the point that not only are the
size, location, and direction of the CNV important, but so too is the number of copies. A
range of other inheritance scenarios are reviewed by Hehir-Kwa et al. (2013), including
X-linked CNVs (which vary widely across individuals) and mosaic imbalances (which may vary
across an individual's cell types; Biesecker and Spinner, 2013; Forsberg et al., 2013;
Kousoulidou et al., 2013).

Another point concerning CNV interpretation is the phenomenon of pleiotropy. As discussed
above, a large proportion of reported recurrent CNVs have replicated across diseases
(Cooper et al., 2011; Girirajan et al., 2011; Sahoo et al., 2011; Williams et al., 2011).
Thus, the same microduplications at 1q21.1 have been associated with both autism and
schizophrenia (Weiss et al., 2008; McCarthy et al., 2009). Relevant factors influencing the
expressivity of this microduplication are a combination of environmental, epigenetic, and
oligogenic (other modifier genes; Girirajan et al., 2010) factors. The precise mechanisms
of causality that lead to a particular etiology are thus likely to be extremely complex,
which calls into question what, if anything, might be reported in patients' EMRs. Such
questions are the subject of ongoing debate (Fabsitz et al., 2010; Cassa et al., 2012), and
are beyond the scope of this review. However, it is obvious that as genomic data become
increasingly ubiquitous, we will require extensive guidelines in determining how CNV
results should be interpreted and shared. For the same reason, it is critical that
healthcare professionals receive adequate training and resources to understand and
communicate test results.



Additionally, due to large numbers of cell divisions, CNVs, particularly deletions, can be
acquired in the hematopoietic progenitor cells. We have previously shown that acquired
mosaicism increases with age and can be associated with hematological disorders (Laurie
et al., 2012; Schick et al., 2013). When analyzing CNVs associated with neurological
disorders, however, such acquired CNVs must be distinguished from germline mutations that
are represented in non-hematological tissues, such as brain.

CONCLUSION 

To date, a large number of diseases, across a large range of fields, have been associated
with CNVs. We are still in our relative infancy in terms of determining the pathogenicity
of such structural variants. We have stressed the need for a large, publicly accessible,
and curated repository where CNVs that have been validated across platforms and
technologies are stored. Whether this repository stems from improving existing catalogs or
is developed ab initio remains to be determined, but the necessity of such a resource is
compelling. Several eMERGE-led projects could funnel directly into such a repository, which
would have real potential to impact healthcare.

A number of obstacles have stymied result-sharing: difficulties identifying CNVs
(particularly in regions enriched for repetitive content), a shortage of standards, and the
nature of CNV disease burden. These problems have attracted much attention in the past
several years, and are well-characterized. While there is general agreement that such
obstacles are substantial, there is a similar degree of optimism that the benefits to be
derived from solving these problems far outweigh the costs required. Again, consortium-led
initiatives will likely be the most effective platforms for standardizing CNV-calling
algorithms and developing guidelines for clinical care. The time is ripe for such
initiatives, and we expect to see CNV-driven research make a major impact in clinical care
in the next decade.